POLS 2972Q

Quantitative Analysis in Political Science

Lecture 8 | Data Visualization II

Plan for Today

  • Continue discussing plots in ggplot
  • Examine bar graphs
  • Edit labels, scales, and themes
  • In-class activity

Plots in R

The Anatomy of ggplot

  • Standard:
ggplot(data = <DATA>) + 
        <GEOM_FUNCTION>(mapping = aes(<MAPPINGS>))


  • Recall (again…)

    • ggplot() is the plot function
      • Creates a coordinate system that you can add layers to
    • data = is where you specify the dataset that you are using
    • A geom function such as geom_point() adds a layer, which specifies how the data will be plotted
    • mapping defines how variables in your dataset are mapped to visual properties
    • aes() specifies which variables map to the x and y axes (and to other aesthetics)
  • Using pipes:
<DATA> %>%
        ggplot() + 
        <GEOM_FUNCTION>(mapping = aes(<MAPPINGS>))
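A small sketch comparing the two forms above, assuming ggplot2's built-in mpg data. %>% comes from the magrittr package (it is also loaded by library(tidyverse)); the native |> pipe is an assumption about your setup, since it needs R 4.1 or later. All three objects describe the same plot of the same data.

```r
library(ggplot2)
library(magrittr)  # provides %>%; library(tidyverse) also loads it

# Standard form: the data frame is passed as the data argument
p_standard <- ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy))

# Piped form: %>% passes mpg in as the first argument of ggplot()
p_piped <- mpg %>%
  ggplot() +
  geom_point(mapping = aes(x = displ, y = hwy))

# Base R's native pipe (R 4.1 or later) works the same way here
p_native <- mpg |>
  ggplot() +
  geom_point(mapping = aes(x = displ, y = hwy))
```

Note that the layers are still joined with +, not with a pipe.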

Plotting Functions

ggplot Cheatsheet

Aesthetics

  • Color / Fill / Alpha (transparency)
  • Size
  • Shape
  • Points / Lines / Text
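A sketch of the difference between mapping an aesthetic and setting it, using the mpg data from ggplot2. Inside aes(), an aesthetic is linked to a variable and gets a legend; outside aes(), it is a fixed value applied to every point. The choice of variables (drv, cyl, fl) is only for illustration.

```r
library(ggplot2)

# Mapping: aesthetics inside aes() are linked to variables,
# and ggplot2 builds a legend for each one
p_mapped <- ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy,
                           color = drv,  # color by drive type
                           size = cyl,   # size by number of cylinders
                           shape = fl))  # shape by fuel type

# Setting: fixed values outside aes() apply to every point, with no legend
p_set <- ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy),
             color = "blue", alpha = 0.3, size = 2)
```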

Faceting

  • Faceting creates subplots that each display one subset of the data.
    • useful for categorical variables
  • For a single variable, use facet_wrap(~ <VARIABLE NAME>)
ggplot(data = mpg) + 
  geom_point(mapping = aes(x = displ, y = hwy, color = class)) + 
  facet_wrap(~ class, nrow = 2)

  • For a combination of two variables, use facet_grid(<ROW VARIABLE> ~ <COLUMN VARIABLE>)
ggplot(data = mpg) + 
  geom_point(mapping = aes(x = displ, y = hwy, color = class)) + 
  facet_grid(drv ~ cyl)
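Two faceting variations, sketched on the same mpg data. In facet_grid(), a "." in place of a variable means "do not facet in this dimension". In facet_wrap(), ncol sets the panel layout and scales = "free_y" lets each panel use its own y-axis range; which options you need depends on your data.

```r
library(ggplot2)

# One row of panels, one column per value of cyl
p_cols <- ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy)) +
  facet_grid(. ~ cyl)

# Three panels side by side, each with its own y-axis range
p_wrap <- ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy)) +
  facet_wrap(~ drv, ncol = 3, scales = "free_y")
```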

Geometric Objects

  • Geoms
    • A geom is the geometrical object that a plot uses to represent data
    • People often describe plots by the type of geom that the plot uses
    • Every geom function in ggplot takes a mapping argument
      • Not every aesthetic works with every geom
        • You could set the shape of a point, but you couldn’t set the “shape” of a line
ggplot(data = mpg) + 
  geom_point(mapping = aes(x = displ, y = hwy))

ggplot(data = mpg) + 
  geom_smooth(mapping = aes(x = displ, y = hwy))
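A line has no shape, but it does have a linetype aesthetic. This sketch draws one smoothed line per drive type. The method and formula arguments are given explicitly only so that geom_smooth() does not print its message about the default method.

```r
library(ggplot2)

# One smooth line per value of drv, each drawn with a different linetype
p <- ggplot(data = mpg) +
  geom_smooth(mapping = aes(x = displ, y = hwy, linetype = drv),
              method = "loess", formula = y ~ x)
```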

Multiple Geoms

ggplot(data = mpg) + 
  geom_point(mapping = aes(x = displ, y = hwy)) +
  geom_smooth(mapping = aes(x = displ, y = hwy))

  • What do you notice about the code above?
  • How could this code be more efficient?
ggplot(data = mpg, mapping = aes(x = displ, y = hwy)) + 
  geom_point() + 
  geom_smooth()

  • What is the difference between these two code chunks?
    • Global vs local mapping
    • Mappings passed to ggplot() are global: they apply to EACH geom
    • Mappings passed to a geom are local: they apply ONLY to that geom, extending or overriding the global mappings for that layer
ggplot(data = mpg, mapping = aes(x = displ, y = hwy)) + 
  geom_point(mapping = aes(color = class)) + 
  geom_smooth()
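A layer can override more than the mapping: it can also use its own data. In this sketch (which assumes dplyr is installed, for filter()), the points use all of mpg while the smoothing line is fit only to subcompact cars.

```r
library(ggplot2)
library(dplyr)  # for filter()

# Global data and mapping in ggplot(); geom_smooth() swaps in a subset
p <- ggplot(data = mpg, mapping = aes(x = displ, y = hwy)) +
  geom_point(mapping = aes(color = class)) +
  geom_smooth(data = filter(mpg, class == "subcompact"),
              method = "loess", formula = y ~ x, se = FALSE)
```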

Other ggplot functions

  • coord_flip() swaps the x and y axes, which can make plots with long category labels easier to read
  • scale_*() functions (e.g., scale_y_continuous()) control the breaks, labels, and limits of the x and y axes
  • The plotly package (ggplotly()) makes plots interactive; you can hover over points/lines for more information
  • labs() adds or edits the title, subtitle, caption, and x and y axis labels
  • The gganimate package animates plots (e.g., as GIFs)
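A sketch combining coord_flip() and labs() on a bar chart of vehicle class in mpg. The title, subtitle, and caption text are placeholders chosen for illustration. Note that labs(x = ) still names the x aesthetic (class), even though coord_flip() draws it on the vertical axis.

```r
library(ggplot2)

p <- ggplot(data = mpg) +
  geom_bar(mapping = aes(x = class)) +
  coord_flip() +  # class names now run down the vertical axis
  labs(title = "Vehicles by Class",
       subtitle = "Number of cars in each class",
       caption = "Data: mpg (ggplot2)",
       x = "Vehicle class",     # labels the x aesthetic, drawn vertically
       y = "Number of cars")
```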

Barplot

  • Bar plots show a numeric value (often a count) for each level of a categorical variable
  • Bar charts, histograms, and frequency polygons bin your data and then plot bin counts
library(tidyverse)

diamonds
# A tibble: 53,940 × 10
   carat cut       color clarity depth table price     x     y     z
   <dbl> <ord>     <ord> <ord>   <dbl> <dbl> <int> <dbl> <dbl> <dbl>
 1  0.23 Ideal     E     SI2      61.5    55   326  3.95  3.98  2.43
 2  0.21 Premium   E     SI1      59.8    61   326  3.89  3.84  2.31
 3  0.23 Good      E     VS1      56.9    65   327  4.05  4.07  2.31
 4  0.29 Premium   I     VS2      62.4    58   334  4.2   4.23  2.63
 5  0.31 Good      J     SI2      63.3    58   335  4.34  4.35  2.75
 6  0.24 Very Good J     VVS2     62.8    57   336  3.94  3.96  2.48
 7  0.24 Very Good I     VVS1     62.3    57   336  3.95  3.98  2.47
 8  0.26 Very Good H     SI1      61.9    55   337  4.07  4.11  2.53
 9  0.22 Fair      E     VS2      65.1    61   337  3.87  3.78  2.49
10  0.23 Very Good H     VS1      59.4    61   338  4     4.05  2.39
# ℹ 53,930 more rows
?diamonds
  • Let’s build a bar graph that examines the different types of “cut” in the data.
ggplot(data = diamonds) + 
  geom_bar(mapping = aes(x = cut))

  • What do we notice about this plot?
    • Where did “count” come from?
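One way to see where "count" comes from: tally the rows yourself and compare with the numbers ggplot2 computed. This sketch assumes the tidyverse is loaded; ggplot_build() returns the data ggplot2 actually draws, including the count column that stat_count() adds.

```r
library(tidyverse)

# dplyr's count() tallies the rows in each level of cut,
# the same tally stat_count() makes for geom_bar()
cut_counts <- diamonds %>%
  count(cut)
cut_counts

# The counts ggplot2 computed can be read back from the built plot
bar_plot <- ggplot(data = diamonds) +
  geom_bar(mapping = aes(x = cut))
built_counts <- ggplot_build(bar_plot)$data[[1]]$count
built_counts
```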

stat_count()

  • Many graphs, like scatterplots, plot the raw values of your dataset. Other graphs, like bar charts, calculate new values to plot. How does ggplot() compute this?
  • geom_bar() utilizes stat_count() as the default way to make statistical transformations for bar graphs
?geom_bar
  • You can generally use geoms and stats interchangeably.
  • Every geom has a default stat
  • Every stat has a default geom
  • This means that you can typically use geoms without worrying about the underlying statistical transformation
ggplot(data = diamonds) + 
  stat_count(mapping = aes(x = cut))

  • Most of the time, we are not going to be concerned with changing stat

  • stat = "identity": use the y values already in the data as the bar heights, instead of counting rows

demo <- tribble(
  ~cut,         ~freq,
  "Fair",       1610,
  "Good",       4906,
  "Very Good",  12082,
  "Premium",    13791,
  "Ideal",      21551
)

ggplot(data = demo) +
  geom_bar(mapping = aes(x = cut, y = freq), stat = "identity")
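geom_col() is ggplot2's shorthand for geom_bar(stat = "identity"). One catch in the demo data: cut is stored as text, so the bars are sorted alphabetically. This sketch converts cut to a factor with explicit levels to keep the original order.

```r
library(tidyverse)

demo <- tribble(
  ~cut,         ~freq,
  "Fair",       1610,
  "Good",       4906,
  "Very Good",  12082,
  "Premium",    13791,
  "Ideal",      21551
)

# Fix the order of the bars by making cut a factor
demo <- demo %>%
  mutate(cut = factor(cut, levels = c("Fair", "Good", "Very Good", "Premium", "Ideal")))

# Same plot as geom_bar(..., stat = "identity")
col_plot <- ggplot(data = demo) +
  geom_col(mapping = aes(x = cut, y = freq))
```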

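  • y = after_stat(prop), group = 1: plot proportions instead of counts (stat() is deprecated as of ggplot2 3.4.0; use after_stat())
ggplot(data = diamonds) + 
  geom_bar(mapping = aes(x = cut, y = after_stat(prop), group = 1))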
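Why group = 1? stat_count() computes prop within each group, and by default each level of cut is its own group, so (in the ggplot2 behavior R4DS describes) every bar's proportion is 1. Putting all bars in one group makes each proportion that cut's share of all diamonds. A sketch contrasting the two:

```r
library(ggplot2)

# Without group = 1: proportions are computed bar by bar
no_group <- ggplot(data = diamonds) +
  geom_bar(mapping = aes(x = cut, y = after_stat(prop)))

# With group = 1: proportions are computed across all bars
one_group <- ggplot(data = diamonds) +
  geom_bar(mapping = aes(x = cut, y = after_stat(prop), group = 1))
```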

Color and Position Adjustments

  • Color
    • You can use color =
    • You can use fill =
  • Which one is more useful?
ggplot(data = diamonds) + 
  geom_bar(mapping = aes(x = cut, color = cut))

ggplot(data = diamonds) + 
  geom_bar(mapping = aes(x = cut, fill = cut))

  • How does the fill aesthetic change if we map it to the variable “clarity”?
ggplot(data = diamonds) + 
  geom_bar(mapping = aes(x = cut, fill = clarity))

  • Position
    • The bars in the plot above are stacked automatically because geom_bar() uses position = "stack" by default
    • If you don’t want a stacked bar chart, you can use one of three other options
      • “identity”
      • “dodge”
      • “fill”
  • Identity
    • Place each object exactly where it falls in the context of the graph
    • Overlaps the bars
    • Not very useful for bar graphs, because the bars overlap and hide one another
ggplot(data = diamonds, mapping = aes(x = cut, fill = clarity)) + 
  geom_bar(position = "identity")

ggplot(data = diamonds, mapping = aes(x = cut, fill = clarity)) + 
  geom_bar(alpha = 1/5, position = "identity")
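Besides transparency, another way to see overlapping bars with position = "identity" is to draw only their outlines. This sketch maps clarity to color (the outline) and sets fill = NA so the bars are hollow.

```r
library(ggplot2)

# Outline-only bars: color is mapped, fill is set to NA (empty)
p <- ggplot(data = diamonds, mapping = aes(x = cut, color = clarity)) +
  geom_bar(fill = NA, position = "identity")
```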

  • Fill
    • Like stacking, but makes each set of stacked bars the same height
    • Easier to compare proportions across groups
ggplot(data = diamonds) + 
  geom_bar(mapping = aes(x = cut, fill = clarity), position = "fill")

  • Dodge
    • Places overlapping objects directly beside one another
    • Easier to compare individual values
ggplot(data = diamonds) + 
  geom_bar(mapping = aes(x = cut, fill = clarity), position = "dodge")
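position = "fill" rescales each cut's bar to a height of 1, so each segment is a proportion. A sketch, assuming the tidyverse is loaded, that computes the same proportions as a table so you can check what the segments represent:

```r
library(tidyverse)

# Share of each clarity grade within each cut
clarity_shares <- diamonds %>%
  count(cut, clarity) %>%
  group_by(cut) %>%
  mutate(prop = n / sum(n)) %>%  # proportions within each cut
  ungroup()

clarity_shares
```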

Graphics for Communication

  • When exploring data yourself, you already know what the data and graph show
  • When creating graphs to explain results to others, you need to make sure your audience can understand the graph

Labels

  • Title
  • x - axis
  • y - axis
#adding a title
ggplot(data = diamonds) + 
  geom_bar(mapping = aes(x = cut, fill = clarity), position = "dodge") + 
  labs(title = "Cut of Diamond by Clarity")

#adding an x-axis label
ggplot(data = diamonds) + 
  geom_bar(mapping = aes(x = cut, fill = clarity), position = "dodge") + 
  labs(title = "Cut of Diamond by Clarity",
       x = "Diamond Cut")

#adding a y-axis label
ggplot(data = diamonds) + 
  geom_bar(mapping = aes(x = cut, fill = clarity), position = "dodge") + 
  labs(title = "Cut of Diamond by Clarity",
       x = "Diamond Cut",
       y = "Number of Diamonds")
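labs() can also add a subtitle and caption and rename a legend: use the name of the aesthetic that created the legend (here fill). The subtitle and caption text below are placeholders for illustration.

```r
library(ggplot2)

p <- ggplot(data = diamonds) +
  geom_bar(mapping = aes(x = cut, fill = clarity), position = "dodge") +
  labs(title = "Cut of Diamond by Clarity",
       subtitle = "Counts of diamonds in each cut and clarity grade",
       caption = "Data: diamonds (ggplot2)",
       x = "Diamond Cut",
       y = "Number of Diamonds",
       fill = "Clarity")  # legend title for the fill aesthetic
```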

Scales

  • Adjusting the scales lets you control the range (limits), tick marks (breaks), and labels of the x- and y-axes
  • ggplot() automatically adds default scales behind the scenes
#change y-axis scale
ggplot(data = diamonds) + 
  geom_bar(mapping = aes(x = cut, fill = clarity), position = "dodge") + 
  labs(title = "Cut of Diamond by Clarity",
       x = "Diamond Cut",
       y = "Number of Diamonds") +
  scale_y_continuous(breaks = seq(0, 6000, by = 500), limits = c(0, 6000)) 

#change x-axis scale (labels)
ggplot(data = diamonds) + 
  geom_bar(mapping = aes(x = cut, fill = clarity), position = "dodge") + 
  labs(title = "Cut of Diamond by Clarity",
       x = "Diamond Cut",
       y = "Number of Diamonds") +
  scale_y_continuous(breaks = seq(0, 6000, by = 500), limits = c(0, 6000)) +
  scale_x_discrete(labels = c("eh", "ok", "better", "wow", "hey now"))
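Two refinements of the scales above, sketched on the same plot. First, a named vector ties each label to a specific level of cut, so labels stay correct even if the level order changes. Second, limits inside a scale_*() function removes any data outside the range, whereas coord_cartesian(ylim = ) only zooms the view and keeps all the data.

```r
library(ggplot2)

# Each label is matched to a level of cut by name, not by position
cut_labels <- c(Fair = "eh", Good = "ok", `Very Good` = "better",
                Premium = "wow", Ideal = "hey now")

p <- ggplot(data = diamonds) +
  geom_bar(mapping = aes(x = cut, fill = clarity), position = "dodge") +
  scale_x_discrete(labels = cut_labels) +
  scale_y_continuous(breaks = seq(0, 6000, by = 500)) +
  coord_cartesian(ylim = c(0, 6000))  # zoom without dropping data
```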

Themes

  • The default theme (theme_grey()) has a grey background
  • Not the most visually pleasing
#default theme
ggplot(data = diamonds) + 
  geom_bar(mapping = aes(x = cut, fill = clarity), position = "dodge")

#theme_bw()
ggplot(data = diamonds) + 
  geom_bar(mapping = aes(x = cut, fill = clarity), position = "dodge") +
  theme_bw()

#theme_bw() with grid removed
ggplot(data = diamonds) + 
  geom_bar(mapping = aes(x = cut, fill = clarity), position = "dodge") +
  scale_y_continuous(breaks = seq(0, 6000, by = 500), limits = c(0, 6000)) +
  theme_bw() +
  theme(axis.line = element_line(color='black'),
    plot.background = element_blank(),
    panel.grid.minor = element_blank(),
    panel.grid.major = element_blank())

#theme_bw() with origin at 0,0
ggplot(data = diamonds) + 
  geom_bar(mapping = aes(x = cut, fill = clarity), position = "dodge") +
  scale_y_continuous(breaks = seq(0, 6000, by = 500), limits = c(0, 6000), expand = c(0,0)) +
  theme_bw() +
  theme(axis.line = element_line(color='black'),
    plot.background = element_blank(),
    panel.grid.minor = element_blank(),
    panel.grid.major = element_blank())
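A shorter route to a similar look, sketched under the assumption that you are on ggplot2 3.3.0 or later (for expansion()). theme_classic() already draws axis lines with no grid lines. expansion(mult = c(0, 0.05)) starts the bars at the x-axis but leaves 5% headroom above the tallest bar, so no fixed upper limit is needed. theme_set() changes the default theme for later plots in the session and returns the previous theme so it can be restored.

```r
library(ggplot2)

p <- ggplot(data = diamonds) +
  geom_bar(mapping = aes(x = cut, fill = clarity), position = "dodge") +
  scale_y_continuous(expand = expansion(mult = c(0, 0.05))) +  # bars sit on the axis
  theme_classic()  # axis lines, no grid lines

# Make theme_classic() the default for later plots, then restore the old default
old_theme <- theme_set(theme_classic())
theme_set(old_theme)
```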

In-Class Activity

  • We are going to practice producing and altering ggplot() graphs

  • Get with a partner and work through Week 3 | Class Activity

For Next Class

  • Review what you learned today

  • Complete the new set of readings:

    • We are going to discuss how to transform data in R with the tidyverse
    • R4DS Chapter 3
      • Follow along with the exercises on your own computer
  • Bring your laptops to class